Extracting Keywords from Digital Document Collections
Author
Abstract
An indexing tool was built to support one of several information-seeking tasks. In accordance with the basic working principles of the HUMLE laboratory at SICS, the indexing solution is a semi-automatic tool. This approach is also relevant because the indexing project continues in co-operation with the Swedish Parliament, where a staff of professional indexers is currently investigating whether automatic and semi-automatic indexing tools can raise productivity.
Similar resources
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates automatic keyword extraction from the tables of contents of Persian e-books in the field of science using LDA topic modeling, evaluating the keywords' similarity with a golden standard and users' views of the model's keywords. Methodology: This is a mixed text-mining study in which LDA topic modeling is used to extract keywords from the table of contents of sci...
Aletras, Nikolaos, Timothy Baldwin, Jey Han Lau and Mark Stevenson (to appear) Representing Topics Labels for Exploring Digital Libraries, In Proceedings of Digital Libraries 2014, London, UK
Topic models have been shown to be a useful way of representing the content of large document collections, for example via visualisation interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a set of keywords, i.e. the top-n words with highest marginal probability within the topic. However, alternativ...
Mining Technique Using Association Rules Extraction
automatically extracting association rules from collections of textual documents. The technique is called Extracting Association Rules from Text (EART). It relies on keyword features to discover association rules amongst the keywords labeling the documents. In this work, the EART system ignores the order in which the words occur, focusing instead on the words and their statistical distributions...
Mining Cross-document Relationships from Text
The paper argues that automatic link generation and typing methods are needed to find and maintain cross-document links in large and growing textual collections. Such links are important to organise information and to support search and navigation. We present an experimental study on mining cross-document links from a collection of 5000 documents. We identify a set of link types and show that th...
Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm
Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...